PC-SIZE
A Program for Sample Size Determinations
Version 2.13
(c) 1985, 1986
"One of many STATOOLS(tm)..."
by
Gerard E. Dallal
54 High Plain Road
Andover, MA 01810
PC-SIZE determines the sample size requirements for single
factor experiments, two factor experiments, randomized blocks
designs, and paired t-tests. In generic F mode, PC-SIZE can
determine sample sizes for any experiment in which the power
at the alternative is given by a non-central F distribution
with fixed numerator degrees of freedom, denominator degrees
of freedom that are linear in the sample size, and a non-
centrality parameter that is proportional to the sample size.
PC-SIZE can determine the sample size needed to detect a non-
zero population correlation coefficient when sampling from a
bivariate normal distribution. It can also be used to obtain
the common sample size required to test the equality of two
proportions. PC-SIZE can calculate the power of specific
sample sizes as well as determine the sample size needed to
achieve specific power.
NOTICE
Copyright 1985 and 1986 by Gerard E. Dallal. The pair of
PC-SIZE programs is shareware. Please see the notice in the
documentation for PC-SIZE: Consultant.
Please acknowledge PC-SIZE in any manuscript that uses its
calculations.
DISCLAIMER
STATOOLS are provided "as is" without warranty of any kind.
The entire risk as to the quality, performance, and fitness
for intended purpose is with you. You assume responsibility
for the selection of the program and for the use of results
obtained from that program.
TABLE OF CONTENTS
Features
Installation
Operation
   Specifying the design
   Specifying the alternative
   Generic F mode
   Initial approximation
Correlation coefficient
Proportions
Paired t-test
Other applications
   Two sample t-test
   Two period cross-over design
   Comparing a single sample to a known standard
Power of specific sample sizes
Non-centrality parameters
Validation
Algorithms
References
Sample size tables for the correlation coefficient
FEATURES
1. Flexibility:
   Query system for single factor, two factor, randomized
   blocks designs and paired t-tests.
   Generic Mode permits sample size calculations for many
   problems in which the power at the alternative is
   given by the non-central F distribution.
2. Portability: PC-SIZE is written in FORTRAN 77, but not
too far from the 66 standard. To make PC-SIZE run on a
VAX, for example, all you need do is modify the I/O unit
numbers (contained in a single DATA statement) and an
OPEN statement.
3. PC-SIZE will calculate the power of a specific sample
size as well as the sample size required to achieve
specific power.
4. Calculations may be saved in a designated output file.
5. Double precision calculations are used throughout.
6. Quantities contained in square brackets at the prompts
are default values which can be obtained by pressing the
return key. Default values are updated with the latest
entry for each quantity, thereby simplifying the task of
requesting a number of sample size calculations that
share many of the same specifications.
7. Trailing decimal points may be omitted or included as you
wish.
INSTALLATION
PC-SIZE is written for the IBM-PC. Installation on a new
computer may entail modifying the following statements:
The first DATA statement:
   IIN   -- input unit number (screen)
   IOUT  -- output unit number (screen)
   IWOUT -- save file unit number
   NMAX0 -- large integer constant (the largest sample size
            that can be considered)
The OPEN statement for the save file just before statement
10.
OPERATION
Operation begins with the user specifying the level of the
test and the power required at the alternative. PC-SIZE will
report the number of observations per cell, per group (in the
case of proportions), or per randomized block.
Specifying the Design
Single factor designs: The user is prompted for the number
of groups.
Two factor designs: The user is prompted for the number of
levels of each factor. (Estimates are based on the main
effects of factor A. Use generic F mode to base estimates on
the interaction structure.) The user can then indicate
whether an interaction term will be present in the model and
the ANOVA table. (A * B * (N - 1) denominator degrees of
freedom, where 'A' and 'B' are the number of levels of the
two factors, if interaction is present; A*B*N - A - B + 1
denominator degrees of freedom, if not.)
Randomized blocks designs: The user is prompted for the
number of levels of the treatment factor. PC-SIZE calculates
the number of blocks needed to achieve the desired power
assuming each block receives one complete set of treatments.
Paired t-tests: The user is prompted for the expected
difference and the standard deviation of the differences.
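The denominator degrees of freedom quoted above for two factor designs can be checked with a short Python sketch. It is not part of PC-SIZE, and the function name is only illustrative.

```python
def two_factor_error_df(a, b, n, interaction=True):
    """Denominator degrees of freedom for a two factor design with
    a and b levels and n observations per cell."""
    if interaction:
        return a * b * (n - 1)
    return a * b * n - a - b + 1


# A = 3, B = 2, N = 4 observations per cell
print(two_factor_error_df(3, 2, 4, interaction=True))   # 18, i.e. 6 * (N - 1)
print(two_factor_error_df(3, 2, 4, interaction=False))  # 20, i.e. 6 * N - 4
```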
Specifying the Alternative
In the cases of single factor, two factor, and randomized
blocks designs, the user is given three options for
specifying the alternative at which the power is to be
evaluated:
1. Specifying the individual effects. PC-SIZE automatically
centers the effects about zero. It is not necessary to
subtract the mean from each effect before entry.
2. Specifying a range (a single number) for the effects.
The minimum and maximum effects are assumed to occupy the
endpoints of the range with the remaining effects
distributed uniformly throughout.
3. Specifying the average squared effect (where, for this
option, the mean has been subtracted from each effect
before squaring) divided by the error variance.
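The first two options can both be reduced to the quantity entered under option 3. The Python sketch below is an illustration only, not PC-SIZE's own code. The function names are made up, and "distributed uniformly" is read here as equally spaced, which is an assumption.

```python
def avgesq_from_effects(effects, evar):
    """Option 1: centre the effects about zero, average the squares,
    and divide by the error variance."""
    mean = sum(effects) / len(effects)
    return sum((e - mean) ** 2 for e in effects) / len(effects) / evar


def effects_from_range(k, effect_range):
    """Option 2 (assumed reading): k effects equally spaced across the
    range, with the minimum and maximum at the endpoints."""
    if k < 2:
        raise ValueError("a range needs at least two effects")
    step = effect_range / (k - 1)
    return [-effect_range / 2 + i * step for i in range(k)]


# Three groups, error variance 1. Uncentred effects are accepted as entered.
print(avgesq_from_effects([10.0, 11.0, 12.0], 1.0))          # about 0.667
print(avgesq_from_effects(effects_from_range(3, 2.0), 1.0))  # about 0.667
```

Both calls describe the same alternative (effects of -1, 0 and 1 after centering), so they give the same average squared effect.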
Generic F Mode
Generic mode requires more sophistication on the part of the
user but is capable of handling a wide variety of problems,
specifically, any problem for which the power at the
alternative is given by a non-central F distribution with
fixed numerator degrees of freedom, denominator degrees of
freedom that are linear in the sample size, and a non-
centrality parameter that is a multiple of the sample size.
(Non-centrality parameters are discussed below.) The user is
prompted for the numerator degrees of freedom, the linear
function that defines the denominator degrees of freedom, and
the multiple of the sample size that defines the non-
centrality parameter.
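To make the three generic mode inputs concrete, here is a minimal Python sketch of the calculation they define. It is not PC-SIZE's code: it uses a textbook continued fraction for the incomplete beta function and a Poisson mixture for the non-central F, and it scans sample sizes upward from the first valid one instead of starting from a large sample estimate. All function and parameter names are hypothetical, and the denominator slope is assumed to be positive.

```python
import math


def _betacf(a, b, x):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 1000):
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for term in (even, odd):
            d = 1.0 + term * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + term / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-14:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_power(alpha, d1, d2, lam):
    """Power of a level-alpha F test with d1 and d2 degrees of freedom and
    non-centrality parameter lam (the sum of squared means)."""
    a, b = d1 / 2.0, d2 / 2.0
    # Critical point on the beta scale: solve I_x(a, b) = 1 - alpha by bisection.
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if _betai(a, b, mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    x = (lo + hi) / 2.0
    # Non-central F as a Poisson(lam / 2) mixture of central terms.
    # For very large lam, exp(-lam / 2) underflows; this sketch ignores that.
    half = lam / 2.0
    weight = math.exp(-half)
    cdf, j = 0.0, 0
    while True:
        cdf += weight * _betai(a + j, b, x)
        j += 1
        weight *= half / j
        if j > half and weight < 1e-16:
            break
    return 1.0 - cdf


def generic_f_sample_size(alpha, power, d1, df2_slope, df2_const, lam_per_n):
    """Smallest n meeting the power, with d2 = df2_slope * n + df2_const
    and lam = lam_per_n * n (plain upward scan, slope assumed positive)."""
    n = 1
    while df2_slope * n + df2_const <= 0:
        n += 1
    while f_power(alpha, d1, df2_slope * n + df2_const, lam_per_n * n) < power:
        n += 1
    return n


# Numerator df 2, denominator df 6 * N - 4, non-centrality 4 * N
print(round(f_power(0.05, 2, 6 * 3 - 4, 4.0 * 3), 5))
print(generic_f_sample_size(0.05, 0.80, 2, 6, -4, 4.0))
```

The inputs in the last two lines are those of validation example 3.3.1 (main effects for A, no interaction in the model), for which this document reports a power of 0.79896 at N = 3 and a required sample size of 4. Those figures give a reference point for checking the sketch.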
Initial Approximation
PC-SIZE invokes a "large sample approximation" (using a non-
central chi-square power function in place of the non-central
F) to get a rough estimate of the necessary sample size. The
power is calculated at increments of 1 if the estimate is
less than 500, 10 if the estimate is between 500 and 5000,
100 if the estimate is between 5000 and 50000, and so on.
The calculations start at the large sample estimate less 5%
or a count of 10, whichever is greater, rounded to the
nearest increment, and continue until the required power is
obtained. The correlation coefficient and proportions are
handled differently--see below.
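The stepping rule can be sketched in Python as follows. This is an interpretation, not PC-SIZE's code. Two readings are assumptions: an estimate falling exactly on a boundary (500, 5000, ...) takes the larger increment, and the starting point backs off from the estimate by 5% or by 10, whichever is greater. The function names are made up.

```python
def increment(estimate):
    """Step between trial sample sizes: 1 below 500, 10 from 500 to 5000,
    100 from 5000 to 50000, and so on (boundary handling is assumed)."""
    step, bound = 1, 500
    while estimate >= bound:
        step *= 10
        bound *= 10
    return step


def starting_size(estimate):
    """Assumed reading: back off by 5% of the estimate or by 10, whichever
    is greater, then round to the nearest increment (never below one step)."""
    step = increment(estimate)
    start = estimate - max(0.05 * estimate, 10)
    return max(step, int(start / step + 0.5) * step)


for est in (120, 1000, 6000):
    print(est, increment(est), starting_size(est))
```

Under these assumptions an estimate of 1000 is searched in steps of 10 starting from 950.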
CORRELATION COEFFICIENT
This mode is used when sampling from a bivariate normal
population, neither of the two variables having its values
fixed prior to sampling. PC-SIZE will calculate the sample
size needed to carry out a two-tailed test of the hypothesis
that the population correlation coefficient is 0. The user
is prompted for a non-null value of the coefficient.
Note: The distribution of the sample correlation coefficient
when the population value is non-zero is obtained through
numerical integration using Simpson's Rule with some bells
and whistles to speed up convergence. Ordinates of the
density function are calculated recursively, resulting in an
execution time that is proportional to sample size.
PC-SIZE reports the power of the test for sample sizes 3, 4,
8, 16, ... (that is, 3 followed by 2**K for K=2,3,...)
successively until the required power is exceeded. A binary
search is then carried out (with intermediate results NOT
reported) to locate the minimum adequate sample size. If the
sample size is large, the binary search can consume large
amounts of execution time.
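The doubling-then-bisecting search can be sketched in Python as below. The search logic follows the description above. The power function is only a stand-in: it uses the Fisher z approximation, not the numerical integration that PC-SIZE performs, so its answers can differ slightly from PC-SIZE's. All names are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_z_power(n, rho, alpha):
    """Stand-in power function (Fisher z approximation, upper tail only)."""
    if n <= 3:
        return 0.0
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return NormalDist().cdf(math.sqrt(n - 3) * math.atanh(abs(rho)) - z_crit)


def minimum_sample_size(power_at, required):
    """Try n = 3, 4, 8, 16, ... until the power is reached, then bisect
    between the last failing size and the first adequate one."""
    lo, n = None, 3
    while power_at(n) < required:
        lo = n
        n = 4 if n == 3 else 2 * n
    if lo is None:
        return n
    hi = n
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if power_at(mid) >= required:
            hi = mid
        else:
            lo = mid
    return hi


print(minimum_sample_size(lambda n: fisher_z_power(n, 0.3, 0.05), 0.80))
```

With the stand-in approximation this prints 85, one more than the 84 that the table at the end of this document lists for RHO = 0.30, ALPHA = 0.05 and POWER = 0.80.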
The Tables at the end of this document, produced by PC-SIZE,
give the necessary sample size for tests of power
0.50(0.10)0.90, 0.95 at levels 0.05 and 0.01 for underlying
population correlation coefficients of 0.05, 0.10(0.10)0.90.
PROPORTIONS
PC-SIZE uses formulas 3.18 and 3.19 of Fleiss(1981) to
determine the common sample size for a test of the equality
of two proportions. This estimate is a large sample
approximation based on standard normal theory. The user is
prompted for the values of the proportions under the
alternative to equality.
Equal sample sizes: In some instances the values produced by
PC-SIZE will be 1 greater than those in Fleiss's Table A.3.
Fleiss has apparently taken the values produced by the
formulae and rounded to the nearest integer. PC-SIZE reports
the smallest integer not less than the result of the
formulae.
Unequal sample sizes: The user specifies the ratio of sample
2 to sample 1. Calculations are driven by sample 1. The
estimate for sample size 2 is obtained by multiplying sample
1's size by the specified ratio and reporting the smallest
integer no less than this value. This procedure can lead to
situations where (1) the estimated sample sizes are not
precisely in the proportions specified and (2) where
switching the samples' labels and inverting the ratio will
produce slightly different estimates. For example (cf.
Fleiss, 1981, p. 45), with size of test 0.01 and power at
the alternative 0.95:
     P1      P2    RATIO   GROUP1   GROUP2
   0.25    0.40     0.50      531      266
   0.40    0.25     2.00      266      532
Use the smallest sample sizes that are in exactly the
specified ratio and are at least as large as the estimates
produced by PC-SIZE (532 and 266 in the example above).
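The calculation for proportions can be sketched in Python as below. The formulas are the usual continuity-corrected normal approximation for two proportions with unequal sample sizes. They are written from memory, not copied from Fleiss, so treat the exact form as an assumption to be checked against the book. The function name and argument names are made up.

```python
import math
from statistics import NormalDist


def two_proportion_sizes(p1, p2, alpha, power, ratio=1.0):
    """Sample sizes (group 1, group 2) for a two-tailed test of P1 = P2,
    where ratio = (size of sample 2) / (size of sample 1)."""
    z_a = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_b = NormalDist().inv_cdf(power)
    r = ratio
    diff = abs(p2 - p1)
    p_bar = (p1 + r * p2) / (r + 1.0)
    # Uncorrected size for sample 1.
    m = (z_a * math.sqrt((r + 1.0) * p_bar * (1.0 - p_bar))
         + z_b * math.sqrt(r * p1 * (1.0 - p1) + p2 * (1.0 - p2))) ** 2
    m /= r * diff ** 2
    # Continuity correction.
    m *= (1.0 + math.sqrt(1.0 + 2.0 * (r + 1.0) / (m * r * diff))) ** 2 / 4.0
    # Smallest integers not less than the computed values; sample 2 is
    # driven by sample 1, as described in the text.
    n1 = math.ceil(m)
    n2 = math.ceil(r * n1)
    return n1, n2


print(two_proportion_sizes(0.25, 0.40, 0.01, 0.95, ratio=0.5))
print(two_proportion_sizes(0.40, 0.25, 0.01, 0.95, ratio=2.0))
```

With a two-tailed level of 0.01 and power 0.95, the sketch returns (531, 266) for the first call and (266, 532) for the second, which shows how switching the labels and inverting the ratio changes the estimates.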
PAIRED T-TEST
PC-SIZE asks for the expected difference and the standard
deviation of the differences. Often, a researcher will have
some idea of the variances of the individual responses but
not of the variance of the difference. In that case, estimate
the correlation of the responses and use the relation
   var(X - Y) = var(X) + var(Y)
                - 2 * corr(X,Y) * SQRT(var(X)*var(Y)) .
If the variances of the two responses are equal, the relation
reduces to
   var(X - Y) = var(X) * 2 * (1 - corr(X,Y)) .
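As a worked illustration of this relation, the Python sketch below converts the standard deviations of the two responses and their correlation into the standard deviation of the difference, which is the quantity PC-SIZE asks for. The function name is illustrative only.

```python
import math


def sd_of_difference(sd_x, sd_y, corr):
    """Standard deviation of X - Y from the two standard deviations and
    the correlation between the responses."""
    var = sd_x ** 2 + sd_y ** 2 - 2.0 * corr * sd_x * sd_y
    return math.sqrt(var)


# Equal standard deviations of 10 and correlation 0.5:
# var(X - Y) = 100 * 2 * (1 - 0.5) = 100
print(sd_of_difference(10.0, 10.0, 0.5))  # 10.0
```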
OTHER APPLICATIONS
Two Sample t-test
This is a single factor analysis of variance with two groups.
Two period cross-over design
The two period cross-over design can be treated as a paired
t-test with one fewer error degrees of freedom than for the
paired t-test based on the same total number of observations.
Proceed as for a paired t-test, obtaining a sample size of
'n'. For each sequence (AB, BA), take (n+1)/2 observations
if 'n' is odd, 1+n/2 if n is even.
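The rule for splitting the paired t-test sample size between the two sequences is small enough to state as a Python one-liner. The function name is made up for this illustration.

```python
def per_sequence(n):
    """Observations per sequence (AB, BA), given the paired t-test size n."""
    return (n + 1) // 2 if n % 2 else 1 + n // 2


for n in (7, 8):
    print(n, per_sequence(n))  # 7 -> 4 per sequence, 8 -> 5 per sequence
```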
Comparing a Single Sample to a Known Standard
Use the paired t-test mode setting the "expected difference"
to the expected difference between the unknown population
mean and the known standard. Set the "estimate of standard
deviation of difference" to the estimated population standard
deviation.
POWER OF SPECIFIC SAMPLE SIZES
PC-SIZE will perform power calculations for specific sample
sizes as well as determine the sample size required to
achieve specific power. If the requested power is an integer
greater than or equal to 1, PC-SIZE starts its power
calculations at a sample size equal to the requested power.
The user is prompted for an increment and a stopping value.
NON-CENTRALITY PARAMETERS
Different authors use different definitions of the non-
centrality parameter of the non-central F distribution. The
differences typically involve a square root, a factor of
(numerator degrees of freedom + 1), and/or a factor of 2.
PC-SIZE follows the notation of Kendall and Stuart(1973,
pp.237,262): The sum of the squares of "d" independent
normal variables with arbitrary means and unit variances is
said to follow a non-central chi-square distribution with "d"
degrees of freedom and non-centrality parameter equal to the
sum of the squared means. The ratio of a non-central chi-
square variable with "d1" degrees of freedom and non-
centrality parameter "lambda", divided by "d1", to an
independent central chi-square variable with "d2" degrees of
freedom, divided by "d2", is said to follow a non-central F
distribution with "d1" numerator degrees of freedom, "d2"
denominator degrees of freedom, and non-centrality parameter
"lambda". Scheffe(1959,p.414) defines his non-centrality
parameter to be the square root of this quantity.
Following Graybill(1961, Theorem 11.16), a non-centrality
parameter can be obtained as the numerator degrees of freedom
times (the difference between the numerator expected mean
square and the error variance) divided by the error variance.
It is assumed that the error variance is given by the
expected mean square of the denominator of the F-ratio.
The following notation is used throughout this section:
   ALPHA  -- level of the test
   POWER  -- power at the alternative
   K      -- number of effects under test
             (number of groups, levels,...)
   F1     -- numerator degrees of freedom
   F2     -- denominator degrees of freedom
   AVGESQ -- average squared effect divided by
             the error variance
   LAMBDA -- non-centrality parameter
   N      -- sample size
   EVAR   -- error variance (often within cell)
   EFF(I) -- the I-th of the effects under test
   [ AVGESQ = (SUM(EFF(I)**2) / K) / EVAR ]
1. Single Factor Experiment (K Groups):
      LAMBDA = N * SUM(EFF(I)**2) / EVAR
             = N * K * AVGESQ
2. Two Factor Experiment (Factor A -- "A" levels; Factor B
   -- "B" levels):
   Main effects for Factor A:
      LAMBDA = N * B * SUM(EFF(I)**2) / EVAR
             = N * A * B * AVGESQ
   Two factor interaction:
      LAMBDA = N * SUM(EFF(I)**2) / EVAR
             = N * A * B * AVGESQ
3. Randomized blocks designs (Single treatment factor at K
   levels):
      LAMBDA = N * SUM(EFF(I)**2) / EVAR
             = N * K * AVGESQ
4. Simple linear regression: E(Y(i)) = C0 + C1 * X(i)
   (N observations at each X(i), i=1,...,p, with mean 0)
      LAMBDA = N * (C1**2 * SUM(X(i)**2)) / EVAR
5. Quadratic regression:
      E(Y(i)) = C0 + C1 * X(i) + C2 * X(i)**2
   H0: C1 = C2 = 0:
      LAMBDA = N * (C1**2 * SUM(X(i)**2)
                    + 2 * C1 * C2 * SUM(X(i)**3)
                    + C2**2 * SUM(X(i)**4)) / EVAR
   H0: C2 = 0:
      LAMBDA = C2**2 * SUM(X(i)**4)
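For the designs numbered 1 to 3, the non-centrality parameter can be computed directly from the effects, as in the Python sketch below. The function and its "multiplier" argument are illustrative names, not anything PC-SIZE defines. The effects are centered first, mirroring what PC-SIZE does at entry.

```python
def noncentrality(n, effects, evar, multiplier=1):
    """LAMBDA = n * multiplier * SUM(EFF(I)**2) / EVAR, with the effects
    centred about zero first. Use multiplier = 1 for a single factor or
    randomized blocks design and multiplier = B for the main effects of
    factor A in a two factor design."""
    mean = sum(effects) / len(effects)
    return n * multiplier * sum((e - mean) ** 2 for e in effects) / evar


# Single factor, K = 2, AVGESQ = 2: LAMBDA = N * K * AVGESQ = 4 * N
print(noncentrality(4, [-1.0, 1.0], 0.5))                    # 16.0
# Main effects for A, with A = 3, B = 2, AVGESQ = 2/3: LAMBDA = 4 * N
print(noncentrality(4, [-1.0, 0.0, 1.0], 1.0, multiplier=2))  # 16.0
```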
VALIDATION
PC-SIZE was validated by applying it to all of the examples
in sections 3.2 through 3.6 of Odeh and Fox (1975). All were
reproduced, with the following exceptions ("OF" denotes Odeh
and Fox):
Example 3.3.1 (main effects for A with no interaction in the
model): OF estimate 3. PC-SIZE calculates the power of a
sample of size 3 to be 0.79896 (<0.80); 4 are needed.
Example 3.5.2 (test of quadratic regression term): OF
estimate 40. PC-SIZE calculates the power of a sample of
size 40 to be 0.94796 (<0.95); 41 are needed.
Example 3.6.2 (multivariate t-test): OF estimate 100.
PC-SIZE calculates the power of a sample of size 100 to be
0.99484 (<0.995); 101 are needed.
The values of the arguments and the resulting sample size
estimates from PC-SIZE are:
Single Factor Experiment
(K Groups)
LAMBDA = N * SUM(EFF(I)**2) / EVAR
= N * K * AVGESQ
Example 3.2.1:
ALPHA = 0.05 POWER = 0.80 K = 2
F1 = 1 F2 = 2 * (N - 1)
AVGESQ = 2 LAMBDA = 4 * N N = 4
Example 3.2.2:
ALPHA = 0.025 POWER = 0.70 K = 3
F1 = 2 F2 = 3 * (N - 1)
AVGESQ = 1/3 LAMBDA = 1 * N N = 11
Example 3.2.3:
ALPHA = 0.01 POWER = 0.975 K = 6
F1 = 5 F2 = 6 * (N - 1)
AVGESQ = 2/3 LAMBDA = 4 * N N = 9
Two Factor Experiment
(Factor A -- "A" levels; Factor B -- "B" levels)
Main effects for Factor A:
LAMBDA = N * B * SUM(EFF(I)**2) / EVAR
= N * A * B * AVGESQ
A * B interaction:
LAMBDA = N * SUM(EFF(I)**2) / EVAR
= N * A * B * AVGESQ
where EFF(i),i=1,...,A*B are the interaction terms.
Example 3.3.1:
Main effects for A with interaction in model:
ALPHA = 0.05 POWER = 0.80 A = 3
F1 = 2 F2 = 6 * (N - 1) B = 2
AVGESQ = 2/3 LAMBDA = 4 * N N = 4
Main effects for A with no interaction in model:
ALPHA = 0.05 POWER = 0.80 A = 3
F1 = 2 F2 = 6 * N - 4 B = 2
AVGESQ = 2/3 LAMBDA = 4 * N N = 4
Test for interaction (Use generic mode):
ALPHA = 0.05 POWER = 0.90 K = 6
F1 = 2 F2 = 6 * (N - 1)
AVGESQ = 1/2 LAMBDA = 3 * N N = 5
Example 3.3.2:
Main effects for A with interaction in model:
ALPHA = 0.005 POWER = 0.60 A = 4
F1 = 3 F2 = 16 * (N - 1) B = 4
AVGESQ = 1 LAMBDA = 16 * N N = 2
Main effects for A with no interaction in model:
ALPHA = 0.005 POWER = 0.60 A = 4
F1 = 3 F2 = 16 * N - 7 B = 4
AVGESQ = 1 LAMBDA = 16 * N N = 2
Test for interaction (Use generic mode):
ALPHA = 0.10 POWER = 0.60 K = 16
F1 = 9 F2 = 16 * (N - 1)
AVGESQ = 1/8 LAMBDA = 2 * N N = 5
Example 3.3.3:
Main effects for A with interaction in model:
ALPHA = 0.01 POWER = 0.70 A = 2
F1 = 1 F2 = 6 * (N - 1) B = 3
AVGESQ = 1 LAMBDA = 6 * N N = 3
Main effects for A with no interaction in model:
ALPHA = 0.01 POWER = 0.70 A = 2
F1 = 1 F2 = 6 * N - 4 B = 3
AVGESQ = 1 LAMBDA = 6 * N N = 3
Test for interaction (Use generic mode):
ALPHA = 0.001 POWER = 0.90 K = 6
F1 = 2 F2 = 6 * (N - 1)
AVGESQ = 1/2 LAMBDA = 3 * N N = 10
Randomized blocks designs
(Single treatment factor at K levels)
LAMBDA = N * SUM(EFF(I)**2) / EVAR
LAMBDA = N * K * AVGESQ
Example 3.4.1(i):
ALPHA = 0.05 POWER = 0.90 K = 3
F1 = 2 F2 = 2 * (N - 1)
AVGESQ = 2/3 LAMBDA = 2 * N N = 8
Example 3.4.1(ii): multiple treatment factors
use generic mode
ALPHA = 0.05 POWER = 0.90 A = B = 3
F1 = 2 F2 = 8 * (N - 1)
AVGESQ = 2/3 LAMBDA = 6 * N N = 3
Example 3.4.2: multiple treatment factors
use generic mode
ALPHA = 0.001 POWER = 0.95 A = B = 2
F1 = 1 F2 = 12 * N - 2 K = 1,...,6*N
AVGESQ = 1 LAMBDA = 24 * N N = 2
Example 3.4.3: multiple treatment factors
use generic mode
ALPHA = 0.025 POWER = 0.70 A = 6
F1 = 5 F2 = 17 * (N - 1) B = 3
AVGESQ = 1/3 LAMBDA = 6 * N N = 3
Regression using Generic Mode
Simple linear regression
E(Y(i)) = C0 + C1 * X(i)
(N observations at each X(i), i=1,...,p, with mean 0)
LAMBDA = N * (C1**2 * SUM(X(I)**2)) / EVAR
Quadratic regression
E(Y(i)) = C0 + C1 * X(i) + C2 * X(i)**2
LAMBDA =
N * (C1**2 * SUM(X(i)**2) + 2 * C1 * C2 * SUM(X(i)**3)
+ C2**2 * SUM(X(i)**4)) / EVAR
Example 3.5.1 (linear):
ALPHA = 0.001 POWER = 0.995
F1 = 1 F2 = 3 * N - 2
LAMBDA = 17 * N N = 5
Example 3.5.1 (quadratic): H0: C1 = C2 = 0
ALPHA = 0.001 POWER = 0.995
F1 = 2 F2 = 3 * (N - 1)
LAMBDA = 144 * N N = 3
Example 3.5.1 (quadratic): H0: C2 = 0
ALPHA = 0.001 POWER = 0.995
F1 = 1 F2 = 3 * (N - 1)
LAMBDA = 257 * N N = 3
Example 3.5.2 (linear):
ALPHA = 0.025 POWER = 0.95
F1 = 1 F2 = 6 * N - 2
LAMBDA = 1.150 * N N = 14
Example 3.5.2 (quadratic): H0: C2 = 0
ALPHA = 0.025 POWER = 0.95
F1 = 1 F2 = 3 * (N - 1)
LAMBDA = .382 * N N = 41
Multivariate t-test
Example 3.6.1 :
ALPHA = 0.10 POWER = 0.70
F1 = 5 F2 = N - 5
LAMBDA = 1 * N N = 14
Example 3.6.2:
ALPHA = 0.10 POWER = 0.995
F1 = 4 F2 = 2 * N - 5
LAMBDA = .25 * N N = 101
ALGORITHMS
PC-SIZE makes use of the following published routines,
modified to run in double precision:
Best, D.J. and D.E. Roberts (1975). Algorithm AS 91. The
percentage points of the chi-squared distribution. Appl.
Statist.,24,385-388.
Bhattacharjee, G.P. (1970). The incomplete gamma integral.
Appl. Statist.,19,285-287.
Cran, G.W., K.J. Martin and G.E. Thomas (1977). Remark
AS R19 and Algorithm AS 109. A remark on algorithms AS
63: The incomplete beta integral, and AS 64: Inverse of
the incomplete beta function ratio. Appl.
Statist.,26,111-114.
Hill, I.D. (1973). Algorithm AS 66. The normal integral.
Appl. Statist.,22,424-427.
Majumder, K.L. and G.P. Bhattacharjee (1973). Algorithm
AS 63. The incomplete beta integral. Appl.
Statist.,22,409-411.
Odeh, R.E. and J.O. Evans (1974). Algorithm AS 70. The
percentage points of the normal distribution. Appl.
Statist.,23,96-97.
and the author's FORTRAN translation of
Pike, M.C. and I.D. Hill (1966). Algorithm 291. Logarithm
of the gamma function. Commun. Ass. Comput. Mach.,9,684.
REFERENCES
Fleiss, Joseph L. (1981). Statistical Methods for Rates and
Proportions, 2nd ed. New York: John Wiley & Sons, Inc.
Graybill, Franklin A. (1961). An Introduction to Linear
Models, Vol. 1. New York: McGraw-Hill Book Company, Inc.
Kendall, Maurice G. and Alan Stuart (1973). The Advanced
Theory of Statistics, Volume 2, 3rd ed. New York: Hafner
Publishing Co.
Odeh, Robert E. and Martin Fox (1975). Sample Size Choice:
Charts for Experiments with Linear Models. New York:
Marcel Dekker, Inc.
Scheffe, Henry (1959). The Analysis of Variance. New York:
John Wiley and Sons, Inc.
SAMPLE SIZE FOR THE TEST OF A NON-ZERO
CORRELATION COEFFICIENT
ALPHA = 0.05
                          POWER
 RHO     0.50    0.60    0.70    0.80    0.90    0.95
0.05     1536    1959    2467    3137    4198    5192
0.10      384     489     616     782    1046    1293
0.20       96     122     153     193     258     319
0.30       43      54      67      84     112     138
0.40       24      30      37      46      61      75
0.50       15      19      23      29      37      46
0.60       11      13      15      19      24      30
0.70        8       9      11      13      17      20
0.80        6       7       8       9      11      13
0.90        5       5       6       6       8       9
ALPHA = 0.01
                          POWER
 RHO     0.50    0.60    0.70    0.80    0.90    0.95
0.05     2653    3199    3841    4667    5944    7116
0.10      662     798     958    1163    1481    1772
0.20      165     198     237     287     365     436
0.30       72      87     103     125     158     189
0.40       40      48      57      68      86     102
0.50       25      30      35      42      52      62
0.60       17      20      23      27      34      40
0.70       12      14      16      19      23      27
0.80        9      10      11      13      15      18
0.90        6       7       8       9      10      11